AI Is Moving Beyond Chatbots. Claude Cowork Shows What Comes Next

TIME - Tech

The DNA file had been gathering dust in Pietro Schirano's computer for years. Then, earlier this month, he gave it to Claude Code, an "agentic coding tool" developed by Anthropic, for analysis. "I'm attaching my raw DNA file from Ancestry DNA," he told the tool. The AI spawned copies of itself on Schirano's computer, each one simulating an expert in a different part of the genome: one expert on cardiovascular disease, another on aging, a third on autoimmune disease.


Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators

arXiv.org Artificial Intelligence

ELOQUENT is a set of shared tasks that aims to create easily testable high-level criteria for evaluating generative language models. Sensemaking is one such shared task. In Sensemaking, we try to assess how well generative models "make sense out of a given text" in three steps inspired by exams in a classroom setting: (1) Teacher systems should prepare a set of questions, (2) Student systems should answer these questions, and (3) Evaluator systems should score these answers, all adhering rather strictly to a given set of input materials. We report on the 2025 edition of Sensemaking, where we had 7 sources of test materials (fact-checking analyses of statements, textbooks, transcribed recordings of a lecture, and educational videos) spanning English, German, Ukrainian, and Czech languages. This year, 4 teams participated, providing us with 2 Teacher submissions, 2 Student submissions, and 2 Evaluator submissions. We added baselines for Teacher and Student using commercial large language model systems. We devised a fully automatic evaluation procedure, which we compare to a minimalistic manual evaluation. We were able to make some interesting observations. For the first task, the creation of questions, better evaluation strategies will still have to be devised because it is difficult to discern the quality of the various candidate question sets. In the second task, question answering, the LLMs examined overall perform acceptably, but restricting their answers to the given input texts remains problematic. In the third task, evaluation of question answers, our adversarial tests reveal that systems using the LLM-as-a-Judge paradigm erroneously rate both garbled question-answer pairs and answers to mixed-up questions as acceptable.


Cross-modal Information Flow in Multimodal Large Language Models

arXiv.org Artificial Intelligence

The recent advancements in auto-regressive multimodal large language models (MLLMs) have demonstrated promising progress for vision-language tasks. While there exists a variety of studies investigating the processing of linguistic information within large language models, little is currently known about the inner working mechanism of MLLMs and how linguistic and visual information interact within these models. In this study, we aim to fill this gap by examining the information flow between different modalities -- language and vision -- in MLLMs, focusing on visual question answering. Specifically, given an image-question pair as input, we investigate where in the model and how the visual and linguistic information are combined to generate the final prediction. Conducting experiments with a series of models from the LLaVA series, we find that there are two distinct stages in the process of integration of the two modalities. In the lower layers, the model first transfers the more general visual features of the whole image into the representations of (linguistic) question tokens. In the middle layers, it once again transfers visual information about specific objects relevant to the question to the respective token positions of the question. Finally, in the higher layers, the resulting multimodal representation is propagated to the last position of the input sequence for the final prediction. Overall, our findings provide a new and comprehensive perspective on the spatial and functional aspects of image and language processing in the MLLMs, thereby facilitating future research into multimodal information localization and editing.


Towards Kinetic Manipulation of the Latent Space

arXiv.org Artificial Intelligence

The latent space of many generative models is rich in unexplored valleys and mountains. The majority of tools used for exploring them are so far limited to Graphical User Interfaces (GUIs). While specialized hardware can be used for this task, we show that a simple feature extraction of pre-trained Convolutional Neural Networks (CNNs) from a live RGB camera feed does a very good job at manipulating the latent space with simple changes in the scene, with vast room for improvement.


Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers

arXiv.org Artificial Intelligence

Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a flat learning rate for the entire network in this context. We perform a hyperparameter optimization process to find learning rate distributions that are better than a flat learning rate. We combine the learning rate distributions thus found and show that they generalize to better performance with respect to the problem of catastrophic forgetting.
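To make the contrast with a flat learning rate concrete, here is a minimal sketch of a per-layer learning rate distribution. The geometric decay by depth and the names `layerwise_learning_rates`, `top_lr`, and `decay` are assumptions chosen for illustration; the abstract says only that distributions are found through hyperparameter optimization and does not state their shape.

```python
from typing import List


def layerwise_learning_rates(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """Return one learning rate per layer, ordered from the lowest layer to the top.

    Hypothetical parametric family: the top layer gets `top_lr`, and each layer
    below it is scaled by a further factor of `decay`. With decay == 1.0 this
    reduces to the usual flat learning rate.
    """
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


# Flat baseline: every layer shares the same rate.
flat = layerwise_learning_rates(num_layers=3, top_lr=1e-3, decay=1.0)

# Decayed distribution: lower layers, which hold more general pretrained
# features, are updated more cautiously than the top layer.
decayed = layerwise_learning_rates(num_layers=3, top_lr=1e-3, decay=0.5)
```

In a search like the one described, `decay` (or a free rate per layer) would be one of the hyperparameters being optimized, with forgetting on the pretraining task as part of the objective.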


Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
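The loop described in this abstract (start from a population of prompts, generate new prompts from existing ones, keep the best according to a development set) can be sketched as follows. This is an illustrative sketch, not the EvoPrompt implementation: `evolve_prompts`, `toy_score`, and `toy_recombine` are hypothetical names, the recombination step is a plain word splice standing in for an LLM call, and the keyword count stands in for a development-set metric.

```python
import random
from typing import Callable, List


def evolve_prompts(
    population: List[str],
    score: Callable[[str], float],
    recombine: Callable[[str, str], str],
    generations: int = 5,
    seed: int = 0,
) -> List[str]:
    """Gradient-free search: create children from parent pairs, keep the best."""
    rng = random.Random(seed)
    size = len(population)
    pop = list(population)
    for _ in range(generations):
        children = []
        for _ in range(size):
            parent_a, parent_b = rng.sample(pop, 2)  # two distinct parents
            children.append(recombine(parent_a, parent_b))  # an LLM call in practice
        # Drop duplicates (order preserved), then keep the top `size` prompts.
        candidates = list(dict.fromkeys(pop + children))
        pop = sorted(candidates, key=score, reverse=True)[:size]
    return pop


# Toy stand-ins (assumptions for illustration only).
KEYWORDS = ("step", "carefully", "answer")


def toy_score(prompt: str) -> int:
    """Pretend development-set score: number of keywords the prompt contains."""
    return sum(word in prompt for word in KEYWORDS)


def toy_recombine(a: str, b: str) -> str:
    """Splice the first half of one prompt onto the second half of another."""
    a_words, b_words = a.split(), b.split()
    return " ".join(a_words[: len(a_words) // 2] + b_words[len(b_words) // 2:])


start = ["Think step by step.", "Read carefully, then reply.", "Give the final answer."]
best = evolve_prompts(start, toy_score, toy_recombine)
```

Because parents compete with their children for a place in the next generation, the best score in the population never decreases from one generation to the next.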


Watch the moment a computer reads a patient's MIND

Daily Mail - Science & tech

It's probably a good idea to keep your opinions to yourself if your friend gets a terrible new haircut - but soon you might not get a choice. That's because scientists at the University of Texas at Austin have trained an artificial intelligence (AI) to read a person's mind and turn their innermost thoughts into text. Three study participants listened to stories while lying in an MRI machine, while an AI 'decoder' analysed their brain activity. They were then asked to read a different story or make up their own, and the decoder could then turn the MRI data into text in real time. The breakthrough raises concerns about 'mental privacy' as it could be the first step in being able to eavesdrop on others' thoughts.


Updated brain map reveals how we control the movement of our bodies

New Scientist

Our movements may be controlled by two distinct networks in our brain, rather than just one. For nearly a century, we have known that the motor cortex – a relatively thin strip of tissue in the centre of the brain that runs across both hemispheres – controls our body movements. In the 1930s, neuroscientists Wilder Penfield and Edwin Boldrey electrically stimulated the brains of people undergoing brain surgery, showing that different parts of the primary motor cortex control different parts of the body. They also found that these control areas are arranged in the same order as the body parts they direct, with the toes at one end and the face at the other, as depicted by the so-called homunculus map. Evan Gordon at Washington University School of Medicine in Missouri and his colleagues wanted to use modern technology to look into the Penfield-Boldrey idea in more detail.


Meta's New AI Tool Makes It Easier For Researchers To Analyze Photos

#artificialintelligence

The AI-based tool can create "cutouts" or segments of different parts of an image. This comes in handy when editing photos or analyzing imagery for biological or security purposes. These tasks have one thing in common: you need to be able to identify and separate different objects within an image. Traditionally, researchers have had to start from scratch each time they want to analyze a new part of an image. Meta aims to change this laborious process by being the one-stop shop for researchers and web developers working on such problems.


Machine Translation with Attention in TensorFlow Python from Scratch

#artificialintelligence

Sequence-to-Sequence (Seq2Seq) models have been used extensively in various Natural Language Processing (NLP) tasks such as machine translation, text summarization, and question answering. In this blog post, we will implement a Seq2Seq model for Italian-to-English machine translation using TensorFlow and object-oriented Python. The model architecture will consist of an Encoder, a Decoder, and an Attention mechanism. The first step in any machine learning task is to preprocess the data. We will be using a dataset of Italian-English sentence pairs for our translation task.
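As a rough illustration of what the Attention component computes between the Encoder and the Decoder, here is a minimal sketch in plain Python rather than TensorFlow. Dot-product scoring is an assumption, since this excerpt does not say which scoring function the post uses, and the names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    decoder_state: List[float], encoder_states: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Score each encoder state against the decoder state, then average them.

    Returns the attention weights (one per source position) and the context
    vector (the weighted sum of the encoder states).
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[j] for w, state in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context


# Two source positions; the decoder state points toward the first one.
weights, context = attend([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
```

At each decoding step the context vector is combined with the decoder state to predict the next target word, so the model can focus on different source words as it translates.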